Abstract. Which components of the singular value decomposition of a signal-plus-noise
data matrix are most informative for the inferential task of detecting or estimating
an embedded low-rank signal matrix? Principal component analysis ascribes greater 
importance to the components that capture the greatest variation, i.e., the singular 
vectors associated with the largest singular values. This choice is often justified by 
invoking the Eckart-Young theorem even though that work addresses the problem of
how to best represent a signal-plus-noise matrix using a low-rank approximation and not 
how to best infer the underlying low-rank signal component. 

Here we take a first-principles approach in which we start with a signal-plus-noise 
data matrix and show how the spectrum of the noise-only component governs whether 
the principal or the middle components of the singular value decomposition of the data 
matrix will be the informative components for inference. 

Simply put, if the noise spectrum is supported on a connected interval, in a sense 
we make precise, then the use of the principal components is justified. When the noise 
spectrum is supported on multiple intervals, then the middle components might be more 
informative than the principal components. 

The end result is a proper justification of the use of principal components in the oft 
considered setting where the noise matrix is i.i.d. Gaussian. An additional consequence 
of our study is the identification of scenarios, generically involving heterogeneous noise 
models such as mixtures of Gaussians, where the middle components might be more 
informative than the principal components so that they may be exploited to extract ad- 
ditional processing gain. In these settings, our results show how the blind use of principal 
components can lead to suboptimal or even faulty inference because of phase transitions 
that separate a regime where the principal components are informative from a regime 
where they are uninformative. We illustrate our findings using numerical simulations 
and a real-world example.

1. Introduction
Consider a signal-plus-noise data matrix modeled as 

$$\widetilde{X} = \sum_{i=1}^{r} \theta_i u_i v_i^H + X, \qquad (1)$$

where $X$ denotes the $n \times m$ noise-only matrix and $S = \sum_{i=1}^{r} \theta_i u_i v_i^H$ is the rank-$r$ signal
matrix. Relative to this model, the detection and estimation tasks in signal processing
and data analysis deal with inferring the presence of and estimating the rank-$r$ matrix $S$
given $\widetilde{X}$.

Principal component analysis plays an important role in the setting where $r \ll \min(m, n)$
as described succinctly by Joliffe [HI Chi., pp.1]: 

The central idea of principal component analysis (PCA) is to reduce the
dimensionality of a data set ... while retaining as much as possible of the
variation^1 present in the data set. The ... first few retain most of the
variation^1 present in all of the original variables.

The first few principal components alluded to here refer to the first few singular vectors 
associated with the largest singular values of $\widetilde{X}$. Working with the hypothesis that the
directions of greatest variation of the data set must reflect (or correlate with) the signal 
content and equipped with the singular value decomposition (SVD) as a technique for 
computing these directions, we can tackle the detection problem in the following manner. 

We start off by computing the SVD of $\widetilde{X}$ and plot the singular values
$\widetilde{\sigma}_1 \ge \widetilde{\sigma}_2 \ge \cdots$ in non-increasing order. We then estimate the rank of the latent
signal matrix $S$ based on the rule:

$$\widehat{r} = \{\text{first } i \text{ such that } \mathrm{gap}(i) := \widetilde{\sigma}_i - \sigma_{\mathrm{null}} < \text{threshold}\} - 1, \qquad (2)$$

where $\sigma_{\mathrm{null}}$ is the largest singular value of the noise-only matrix $X$ which is assumed (in
the simplest setting) to be known. This rule, and other modifications thereof, yields an
estimate $\widehat{r}$ for the rank of the latent signal matrix; when $\widehat{r} > 0$ we have detected a signal
matrix; see for example [T2l Section 14.5] or [T9(, Section 6.1.3] for classical approaches 
and [13 121 El HH1 [LTJ 123 123 [261 [2D [221 [25] for recent random matrix-theoretic approaches. 
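
The rule in (2) can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the matrix size, the threshold value and the variable names are hypothetical choices, and the noise edge is set to 2, the limiting largest singular value of a square matrix with i.i.d. mean zero, variance $1/m$ Gaussian entries.

```matlab
% Sketch of the gap rule (2). Sizes, threshold and names are illustrative assumptions.
rng(1);                                   % fixed seed so the run is repeatable
n = 400; m = n;                           % smaller than n = m = 1000 to keep the SVD cheap
X = randn(n, m) / sqrt(m);                % noise-only matrix, i.i.d. N(0, 1/m) entries
u = randn(n, 1); u = u / norm(u);         % unit-norm left signal vector
v = randn(m, 1); v = v / norm(v);         % unit-norm right signal vector
Xtil = 2 * (u * v') + X;                  % rank-one signal plus noise

sigma_til  = svd(Xtil);                   % singular values, non-increasing order
sigma_null = 2;                           % assumed-known noise edge for this noise model
threshold  = 0.25;                        % hypothetical gap threshold

% First index whose gap to the noise edge falls below the threshold.
first_small = find(sigma_til - sigma_null < threshold, 1);
if isempty(first_small)
    r_hat = numel(sigma_til);             % every gap exceeded the threshold
else
    r_hat = first_small - 1;              % rule (2)
end
fprintf('estimated rank r_hat = %d\n', r_hat);
```

For this noise model and signal strength one would expect the estimate to equal 1, although the outcome depends on the chosen threshold.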

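The estimation problem is similarly tackled by computing the truncated SVD of $\widetilde{X}$
that employs the $\widehat{r}$ (leading or) principal components. This yields a rank-$\widehat{r}$ estimate of
the low-rank signal matrix given by

$$\widehat{S} = \sum_{i=1}^{\widehat{r}} \widetilde{\sigma}_i \widetilde{u}_i \widetilde{v}_i^H. \qquad (3)$$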



Does the principal component approach to detection and estimation work? Figure 1(a)
plots the singular values of an $n \times m$ signal-plus-noise data matrix modeled as $\widetilde{X} = 2uv^H + X$,
where the noise-only matrix $X$ has i.i.d. mean zero, variance $1/m$ Gaussian entries
and the signal matrix $S = 2uv^H$ has rank one. This example, where n = m = 1000,
illustrates a setting where the gap heuristic in (2) for signal-matrix detection "works"
subject to a specification of the gap size threshold.



Figure 1(b) plots the $n$ inner-products $\{|\langle \widetilde{u}_i, u \rangle|^2\}_{i=1}^{n}$, where $\{\widetilde{u}_i\}_{i=1}^{n}$ are the left singular
vectors of $\widetilde{X}$. The quantities $\{|\langle \widetilde{u}_i, u \rangle|^2\}_{i=1}^{n}$ (and $\{|\langle \widetilde{v}_i, v \rangle|^2\}_{i=1}^{m}$) are measures of
informativeness of the singular vectors of $\widetilde{X}$ with respect to the singular vectors of the
latent signal matrix. Clearly, the principal left (also, the right - not plotted here) singular



1 Emphasis added. 



INFORMATIVE COMPONENT ANALYSIS 



[Figure 1 panels: (a) The singular value spectrum (horizontal axis: index). (b) PCA example.]

Figure 1. The singular value spectrum of the signal-plus-noise data matrix
exhibits a continuous-looking portion that may be associated with "noise"
singular values and a single separated singular value that may be interpreted
as evidence of a rank-one "signal" matrix buried in the data matrix.



vector is the most informative component and employing it in an estimate of the signal 
matrix as in (3) is judicious.

Extending the notion of informativeness further, we might define "informative components"
as components of the SVD of the data matrix $\widetilde{X}$ that are most correlated with the
embedded low-rank signal matrix and which consequently best (in a manner to be made
precise later) facilitate the detection and estimation tasks described earlier.



For the example in Figure 1(b), the principal component is the most informative component.
In other words, the principal component which captures the greatest variation
in the data is also the component most correlated with the underlying signal matrix. A
natural question arises:

Are the most informative components necessarily the principal components?



[Figure 2 panel: (a) MCA example.]

Figure 2. The singular value spectrum of the signal-plus-noise data matrix
exhibits two continuous-looking portions that may be associated with
"noise" singular values and a single separated singular value that may be
interpreted as evidence of a rank-one "signal" matrix buried in the data
matrix. Note that in contrast to Figure 1, the principal component is not
the most informative component.

Figure 2 constitutes a counter-example. Figure 2(a) plots the singular values of a
signal-plus-noise data matrix modeled as $\widetilde{X} = 2uv^H + X$, where the noise-only matrix $X$ is a
mixture of two multivariate Gaussians with different variances that produces a spectrum
that is supported on two disconnected intervals. The MATLAB code used to generate $\widetilde{X}$
is listed below so the reader may reproduce Figure 2.

n = 1000; m = n;

Sigma = diag([20*ones(n/10,1); ones(n-n/10,1)], 0); % temporal covariance
G = randn(n,m)/sqrt(m)*sqrtm(Sigma);                % noise-only matrix

u = randn(n,1); u = u/norm(u); v = randn(m,1)/sqrt(m);
Xtil = 2*u*v' + G;                                  % signal-plus-noise data matrix



The presence of a signal matrix is reflected in the single singular value that separates
from the continuous looking portions of the spectrum - unlike Figure 1(a), it is in the
middle, i.e., not associated with the principal component that captures the greatest
variation. The rule in (2) would return $\widehat{r} = 0$ here and we would fail to detect the underlying
signal matrix.

Figure 2(b) plots the inner-products $\{|\langle \widetilde{u}_i, u \rangle|^2\}_{i=1}^{n}$, where $\{\widetilde{u}_i\}_{i=1}^{n}$ are the left singular
vectors of $\widetilde{X}$. The quantities $\{|\langle \widetilde{u}_i, u \rangle|^2\}_{i=1}^{n}$ (and $\{|\langle \widetilde{v}_i, v \rangle|^2\}_{i=1}^{m}$) are measures of
informativeness of the singular vectors of $\widetilde{X}$ with respect to the singular vectors of the latent
signal matrix. Clearly, the principal left (also, the right - not plotted here) singular vector
is not the most informative component; the middle component is. Employing the principal
component in an estimate of the signal matrix as in (3) would not be as judicious as
using the most informative component, which is the middle component here.
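
The informativeness measure used in Figures 1(b) and 2(b) can be computed directly from the SVD. The sketch below reuses the noise model of the listing above with a smaller, hypothetical size (n = 500) and normalizes v to unit norm; the variable names and the seed are assumptions, and the index reported as most informative will vary from run to run.

```matlab
% Sketch: informativeness |<u_i~, u>|^2 of every left and right singular vector.
% Size, seed and names are illustrative assumptions, not the settings of Figure 2.
rng(2);
n = 500; m = n;
Sigma = diag([20*ones(n/10,1); ones(n-n/10,1)], 0);   % two noise variance levels
G = randn(n,m)/sqrt(m)*sqrtm(Sigma);                  % noise-only matrix
u = randn(n,1); u = u/norm(u);                        % unit-norm left signal vector
v = randn(m,1); v = v/norm(v);                        % unit-norm right signal vector
Xtil = 2*(u*v') + G;                                  % signal-plus-noise data matrix

[Util, Stil, Vtil] = svd(Xtil);                       % full SVD, singular values descending
sigma_til  = diag(Stil);
info_left  = (Util' * u).^2;                          % squared overlaps with u
info_right = (Vtil' * v).^2;                          % squared overlaps with v

[best_val, best_idx] = max(info_left);
fprintf('most informative left singular vector: index %d (overlap %.3f)\n', ...
        best_idx, best_val);
fprintf('overlap of the principal left singular vector: %.3f\n', info_left(1));
```

Because the columns of `Util` form an orthonormal basis, the entries of `info_left` sum to one, so the vector can be read as the distribution of the signal direction over the components.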

The preceding examples support our assertion that the principal components are not 
necessarily the most informative components and that middle components might some- 
times be more important. The examples also hint at the role played by the spectrum of 
the noise-only matrix X in determining the relative informativeness of the components. 

An additional remark is in order. The Eckart-Young-Mirsky (EYM) theorem JTUJ [23] 
states that for any unitarily invariant norm, the optimal rank $r$ approximation to $\widetilde{X}_n$ is
given by (3). This is a statement about optimal representation of the signal-plus-noise
matrix. It is not a statement about inference on the underlying low-rank signal matrix. 
Thus there is no contradiction between our results and the content of the EYM theorem. 



1.1. Motivation and summary of findings. This work is motivated by the ubiquity 
of principal component analysis (PCA) in data analysis and signal processing and the 
associated importance assigned by practitioners to the leading singular values and vectors 
of the data matrix. 

In emerging applications, such as collaborative learning, graph mining or bioinformatics,
where the data matrix is large, it is infeasible to compute the entire singular value
decomposition. There are, however, efficient techniques for computing the leading singu- 
lar vectors of a matrix that employ iterative techniques such as the Arnoldi or Lanczos 
iteration [6] and the family of Krylov subspace methods or using randomized techniques 
as in [131 El E E]. 

In these 'big data' applications, researchers often invoke PCA as justification for the 
computation of a small number of leading singular vectors of the data matrix. Arguably, 
what a practitioner who uses these principal components as a starting point in an infer- 
ential detection, estimation or classification procedure is really after are the informative 
components. As we have already seen, the informative components need not be the prin- 
cipal components and may even be the middle components. 

In the latter scenario, computation of the leading singular vectors, regardless of com- 
putational considerations or choice of algorithm, might lead to faulty inference and lead 
a non-specialist down a road to a flawed conclusion that they may present as supported 
by standard PCA derived data analysis. The situation is particularly perilous in biomed- 
ical applications involving high-dimensional data sets where one cannot exclude or reason 
about most informative components by visual inspection.^{2,3}

2 http://www.nytimes.com/2011/07/19/health/19gene.html?pagewanted=all
3 http://www.nytimes.com/2011/07/08/health/research/08genes.html?_r=2&hp

A first-principles approach is needed to justify why the principal components might
be informative for simple, canonical noise models but also for identifying when middle
components might be informative. This paper is a step in that direction. In what follows,
we provide a complete picture of how the spectrum of $X$ governs the informativeness of
various components of the SVD of a data matrix $\widetilde{X}$ modeled as in (1). To summarize our
findings:

• The informative components correspond to isolated singular values that separate 
from the noise (or continuous looking) component of the spectrum, 

• Principal components are the most informative components when the noise (or 
the continuous looking) component of the spectrum is supported on one interval, 

• Middle components may be informative when the noise component of the spectrum 
is supported on multiple intervals, 

• Heterogeneities in the noise-only matrix can produce a disconnected noise spec- 
trum, 

• It is possible for both principal and middle components to be informative and, 

• It is possible for the middle component to be informative even when the principal 
component is uninformative. 

Our findings will allow the practitioner to better justify, by employing reasoning based
on the entire spectrum of $\widetilde{X}$, when the use of principal components is warranted (as it is
for the example in Figure 3) and when the middle components might be more informative
as in Figure 2. The next step in this line of inquiry, which is beyond the scope of this paper,
is the development of efficient computational methods for large data sets that can detect
and extract informative middle components.

We conclude by submitting Figure 4 as evidence that our findings describe phenomena
that might already be present in real-world data sets^4 that might previously have been
interpreted differently. Here we have a 438 x 1200 data matrix whose columns contain
measurements made at a receiver sensor array and some of the past transmitted data
symbols. The measurements were made over a time period where there were significant
fluctuations in the noise levels. The fluctuations in the channel transfer function constitute
the low-rank "signal" here.

The plot of the singular values in Figure 4 contains clusters of principal and middle
eigenvalues that separate from the continuous looking portion of the spectrum. Our find- 
ings suggest that these are informative principal and middle components. We hope that 
this work contributes to an increased understanding of the role played by the noise eigen- 
spectrum in shaping the informativeness of various SVD components and a recognition 
that there is much left to understand in terms of low-rank signal extraction from noisy 
data matrices. 

We begin our exposition in Section 2 by examining how the spectrum of $\widetilde{X}$ is related
to the spectrum of $X$. We utilize the findings in Section 3 to analyze a setting where the
principal components are informative. In Section 4 we describe a scenario when middle
components can be informative while Section 5 contains the main results which formalize
the arguments presented in Sections 3 and 4. We conclude in Section 6 with a discussion
of which noise models can produce informative middle and principal components.



4 We thank Dr. James Preisig of the Woods Hole Oceanographic Institution for this dataset. 
[Figure 3 panels: (a) Sample 1. (b) Sample 2. (c) Sample 3. (d) The singular value spectrum
of the training data matrix for the digit "6".]

Figure 3. (a) - (c) Three samples representing the digit "6" from the
USPS handwritten digits database. Each $s$-pixel-by-$s$-pixel training image
is converted into an $n = s^2 \times 1$ column vector whose elements represent
grayscale values. The training data matrix is formed by stacking the column
vectors corresponding to every image in the labeled training data set
alongside each other. (d) displays the singular values of the data matrix
on the left axis. As in Figure 1(a), the singular value spectrum exhibits
a continuous-looking portion (that may be interpreted as "noise") and a
separated portion (that may be interpreted as low-rank "signal"). The right
axis shows the probability of correct classification versus $r$, where $r$ is
the number of left singular vectors of the training set used for classification.
Note that choosing $r$ based on the singular value "gap separation" heuristic
yields near-optimal performance.

Figure 4. A real-world data set with possibly informative principal and
middle components.

2. The eigenvalues and eigenvectors of $\widetilde{X}$

For expositional simplicity, let us consider the model in (1) with $r = 1$ and symmetric
$X$, so that

$$\widetilde{X} = S + X,$$

where $S = \theta uu^*$, for some arbitrary, non-random, unit norm column vector $u$. We begin
our investigation by examining how the eigenvalues and eigenvectors of $\widetilde{X}$ are related to
the eigenvalues and eigenvectors of the low-rank signal matrix $S$.

Let $X = Q \Lambda Q^*$ be the eigen-decomposition of the noise-only random matrix $X$ (we
have suppressed the subscript in $X_n$ for notational brevity), where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$
and $Q$ are the eigenvectors of $X$. We assume that the noise-only random matrix $X$ is
invariant, in distribution, under orthogonal (or unitary) conjugation. This implies that
the eigenvectors of X are Haar-distributed and independent of its eigenvalues [151 Th. 
4.3.5]. We will utilize this fact shortly. 

2.1. Eigenvalues of $\widetilde{X}$. The eigenvalues of $X + S$ are the solutions of the equation

$$\det(zI - (X + S)) = 0.$$

Equivalently, for $z$ such that $zI - X$ is invertible, we have

$$zI - (X + S) = (zI - X) \cdot (I - (zI - X)^{-1} S),$$

so that

$$\det(zI - (X + S)) = \det(zI - X) \cdot \det(I - (zI - X)^{-1} S).$$

Consequently, a simple argument reveals that $z$ is an eigenvalue of $X + S$ and not
an eigenvalue of $X$ if and only if 1 is an eigenvalue of the matrix $(zI - X)^{-1} S$. But
$(zI - X)^{-1} S = (zI - X)^{-1} \theta uu^*$ has rank one, so its only non-zero eigenvalue will equal
its trace, which in turn is equal to $\theta u^*(zI - X)^{-1} u = \theta u^* Q (zI - \Lambda)^{-1} Q^* u$.

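Let $v = Q^*u$. Then, $z$ is an eigenvalue of $\widetilde{X}$ and not an eigenvalue of $X$ if and only if

$$\sum_{i=1}^{n} \frac{|v_i|^2}{z - \lambda_i} = \frac{1}{\theta}. \qquad (4)$$

Let $\mu_n$ be the "weighted" spectral measure of $X$, defined by

$$\mu_n = \sum_{i=1}^{n} |v_i|^2 \delta_{\lambda_i} \quad \text{(the $v_i$'s are the coordinates of $v = Q^*u$).} \qquad (5)$$

Then any $z$ outside the spectrum of $X$ is an eigenvalue of $\widetilde{X}$ if and only if

$$\sum_{i=1}^{n} \frac{|v_i|^2}{z - \lambda_i} =: G_{\mu_n}(z) = \frac{1}{\theta}, \qquad (6)$$

where $G_{\mu}(z)$ is the Cauchy transform of $\mu$ defined as

$$G_{\mu}(z) = \int \frac{1}{z - x} \, d\mu(x). \qquad (7)$$

Equation (6) describes the exact relationship between the eigenvalues of $\widetilde{X}$ and the
eigenvalues of $X$ and the dependence on the coordinates of the vector $v$ (via the measure
$\mu_n$), which we will use shortly.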

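2.2. Eigenvectors of $\widetilde{X}$. Let $\widetilde{u}$ be a unit eigenvector of $X + S$ associated with the
eigenvalue $z$ that satisfies (6). From the relationship $(X + S)\widetilde{u} = z\widetilde{u}$, we deduce that, for
$S = \theta uu^*$,

$$(zI - X)\widetilde{u} = S\widetilde{u} = \theta uu^*\widetilde{u} = (\theta u^*\widetilde{u}) \cdot u \quad \text{(because $u^*\widetilde{u}$ is a scalar),}$$

implying that $\widetilde{u}$ is proportional to $(zI - X)^{-1} u$.
Since $\widetilde{u}$ has unit-norm,

$$\widetilde{u} = \frac{(zI - X)^{-1} u}{\sqrt{u^*(zI - X)^{-2} u}} \qquad (8)$$

and

$$|\langle \widetilde{u}, u \rangle|^2 = |u^*\widetilde{u}|^2 = \frac{(u^*(zI - X)^{-1} u)^2}{u^*(zI - X)^{-2} u} = \frac{(u^* Q (zI - \Lambda)^{-1} Q^* u)^2}{u^* Q (zI - \Lambda)^{-2} Q^* u} = \frac{G_{\mu_n}(z)^2}{\int \frac{d\mu_n(x)}{(z - x)^2}} = \frac{1/\theta^2}{\int \frac{d\mu_n(x)}{(z - x)^2}}. \qquad (9)$$

Notice that

$$G'_{\mu_n}(z) = -\int \frac{d\mu_n(x)}{(z - x)^2} \qquad (10)$$

so that we have

$$|\langle \widetilde{u}, u \rangle|^2 = -\frac{1}{\theta^2 \, G'_{\mu_n}(z)}. \qquad (11)$$

Equation (8) describes the relationship between the eigenvectors of $\widetilde{X}$ and the eigenvalues
of $\widetilde{X}$ and the dependence on the coordinates of the vector $v$ (via the measure $\mu_n$),
which we will return to shortly.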

3. When principal components are the most informative components 

We begin our investigation by considering a setting where the informative components 
do indeed correspond to the principal components. The picture we have developed so far 
is that the eigenvalues $z_i$ and the associated eigenvectors $\widetilde{u}_i$ of the signal-plus-noise data
matrix $\widetilde{X}$ modeled as $\widetilde{X} = X + \theta uu^*$ satisfy the equations

$$G_{\mu_n}(z_i) = \frac{1}{\theta}, \qquad (12a)$$

$$|\langle \widetilde{u}_i, u \rangle|^2 = -\frac{1}{\theta^2 \, G'_{\mu_n}(z_i)}, \qquad (12b)$$

where

$$G_{\mu_n}(z) = \sum_{i=1}^{n} \frac{|v_i|^2}{z - \lambda_i}.$$

The expressions in (12) provide insight on how the eigenvalues of $\widetilde{X}$ are related to the
eigenvalues of $X$.

Figure 5 considers the $n = 5$ setting and shows how the expressions in (12) provide
insight on the informativeness of the eigenvalues and eigenvectors of $\widetilde{X}$.

By (12a), the eigenvalues of $\widetilde{X}$ correspond to the values of $z$ where the horizontal line
$1/\theta$ in Figure 5 intersects the curve $G_{\mu_n}(z)$. Since $G_{\mu_n}(z)$ has poles at the eigenvalues
of $X$, all but the largest eigenvalue of $\widetilde{X}$ interlace the eigenvalues of $X$. Consequently,
$\lambda_5 < \widetilde{\lambda}_5 < \lambda_4$ and so on; there is no eigenvalue to the right of $\lambda_1$ and hence $\widetilde{\lambda}_1$ can be
displaced by a greater amount, subject to $\widetilde{\lambda}_1 - \lambda_1 < \theta$.

Equation (12b) reveals that the informativeness of an eigenvector, denoted by $\mathrm{Inf}_i := |\langle \widetilde{u}_i, u \rangle|^2$,
is inversely proportional to the negative slope of the function $G_{\mu_n}(z)$ evaluated
at the eigenvalue $z_i = \widetilde{\lambda}_i$ of $\widetilde{X}$ associated with the eigenvector $\widetilde{u}_i$.
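
The two relations in (12) hold exactly for finite $n$ and can be checked numerically. The following sketch mirrors the $n = 5$ setting of Figure 5; the signal strength, the seed, the randomly drawn noise eigenvalues and all variable names are assumptions made for illustration.

```matlab
% Sketch: numerical check of (12a) and (12b) for n = 5 (illustrative settings).
rng(3);
n = 5; theta = 1.5;                         % hypothetical size and signal strength
lam = sort(randn(n,1), 'descend');          % eigenvalues of the noise-only matrix X
[Q, ~] = qr(randn(n));                      % a random orthogonal eigenvector matrix
X = Q * diag(lam) * Q'; X = (X + X')/2;     % symmetric noise-only matrix
u = randn(n,1); u = u/norm(u);              % unit-norm signal vector
v = Q' * u;                                 % coordinates of u in the eigenbasis of X

Xtil = X + theta*(u*u');                    % signal-plus-noise matrix
[Util, Dtil] = eig((Xtil + Xtil')/2);
z = diag(Dtil);                             % eigenvalues of Xtil

G  = @(zz)  sum(v.^2 ./ (zz - lam));        % G_{mu_n}(z)
dG = @(zz) -sum(v.^2 ./ (zz - lam).^2);     % derivative of G_{mu_n}(z)

res_a       = arrayfun(@(zz) G(zz) - 1/theta, z);          % residual of (12a)
inf_direct  = (Util' * u).^2;                              % |<u_i~, u>|^2 computed directly
inf_formula = arrayfun(@(zz) -1/(theta^2 * dG(zz)), z);    % right-hand side of (12b)

fprintf('max residual of (12a): %.2e\n', max(abs(res_a)));
fprintf('max mismatch in (12b): %.2e\n', max(abs(inf_direct - inf_formula)));
```

Both printed quantities should be at the level of floating-point round-off, provided no coordinate of `v` is numerically zero.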

3.1. Asymptotic analysis: Eigenvalues. We now place ourselves in the high-dimensional
setting. Let us assume that as $n \to \infty$,

$$\mu_{X_n} := \frac{1}{n} \sum_{i=1}^{n} \delta_{\lambda_i} \xrightarrow{\text{a.s.}} \mu_X,$$

where $\mu_X$ is a non-random probability measure and $\xrightarrow{\text{a.s.}}$ denotes almost sure convergence.^5
Assume that the largest and smallest eigenvalues of $X_n$ converge to $b$ and $a$, respectively and
that $d\mu_X(z) > 0$ for all $z \in (a, b)$ so that the measure is supported on one connected
interval. When $X$ is a sample covariance matrix formed from a matrix with i.i.d. Gaussian
variables, the eigenvalues will satisfy this condition.



5 The argument holds for other modes of convergence as well so we shall not explicitly specify the mode 
of convergence in the expository sections that follow. 
The assumed convergence of the eigenvalues to a smooth limiting measure implies that
as $n \to \infty$, if there were no signal, the eigenvalues would have a continuous looking
spectrum as the spacing between successive eigenvalues goes to zero. By the same reasoning,
when there is a signal, the picture developed in Figure 5 says that all but the leading
eigenvalue of $\widetilde{X}$ will be displaced insignificantly. Thus the $n - 1$ eigenvalues will retain
their continuous looking nature and will be tightly packed together.

As $n \to \infty$, only the largest eigenvalue will exhibit a significant $O(1)$ deviation relative
to the corresponding eigenvalue in the noise-only setting (i.e., when $S = 0$). Since the
second largest eigenvalue is also displaced insignificantly by a vanishing (with $n$) amount,
this manifests as an $O(1)$ gap in the spectrum as in Figure 1(a) and the use of the
(principal) gap heuristic in (2) for signal detection is justified.

We now investigate the fundamental limit of gap heuristic based signal detection. We
first note that the vector $v = Q^*u$ is uniformly distributed on the unit hypersphere, and
so, in the high-dimensional setting, $|v_i|^2 \approx 1/n$ (with high probability) so that

$$\sum_{i=1}^{n} |v_i|^2 \delta_{\lambda_i} =: \mu_n \approx \mu_X := \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \delta_{\lambda_i}.$$

A consequence of $\mu_n \to \mu_X$ is that $G_{\mu_n}(z) \to G_{\mu_X}(z)$. Inverting equation (6) after
substituting these approximations yields the location of the largest eigenvalue, in the
$n \to \infty$ limit, to be $G_{\mu_X}^{-1}(1/\theta)$.

Recall that we had assumed that the limiting probability measure of the noise-only
random matrix $\mu_X$ is compactly supported on a single, connected interval $[a, b]$. Consequently,
the Cauchy transform $G_{\mu_X}$ given by (7) is well-defined for $z$ outside $[a, b]$ and
can tend to a limit $G_{\mu_X}(b^+)$ which may be bounded, i.e., have $G_{\mu_X}(b^+) < +\infty$.

So long as $1/\theta < G_{\mu_X}(b^+)$, as in Figure 6-(a), we obtain $\lambda_1(\widetilde{X}) \approx G_{\mu_X}^{-1}(1/\theta) > b$.
This results in an $O(1)$ gap between the largest eigenvalue and the edge of the spectrum
and the gap heuristic will work. However, when $1/\theta > G_{\mu_X}(b^+)$, as in Figure 6-(b),
$\lambda_1(\widetilde{X}) \to \lambda_1(X) = b$ and the gap heuristic will fail. To summarize:

Principal gap based signal detection will asymptotically succeed iff $\theta > 1/G_{\mu_X}(b^+)$.



3.2. Asymptotic analysis: Eigenvectors. Recall our argument that since $Q$ is isotropically
random, the vector $v = Q^*u$ is uniformly distributed on the unit hypersphere and
$|v_i|^2 \approx 1/n$ (with high probability) in the high-dimensional setting. Consequently, we
have that

$$G'_{\mu_n}(z) = -\sum_{i=1}^{n} \frac{|v_i|^2}{(z - \lambda_i)^2} \approx -\frac{1}{n} \sum_{i=1}^{n} \frac{1}{(z - \lambda_i)^2}.$$

Since all the $n$ eigenvalues of the noise-only matrix are concentrated on the connected
interval $[a, b]$, the average spacing between the eigenvalues of $X$ is $O(1/n)$. Since the
eigenvalues of $\widetilde{X}$ interlace the eigenvalues of $X$, $\widetilde{\lambda}_i - \lambda_i = O(1/n)$ for all but the largest
eigenvalue. Hence $G'_{\mu_n}(z) = O(n)$ so that $\{\mathrm{Inf}_i\}_{i=2}^{n} = -1/(\theta^2 G'_{\mu_n}(z))|_{z = \widetilde{\lambda}_i} = O(1/n)$.
However, $\widetilde{\lambda}_1 - \lambda_1 = O(1)$, so that $G'_{\mu_n}(\widetilde{\lambda}_1) = O(1)$ and $\mathrm{Inf}_1 = O(1)$ implying that the
principal eigenvector is maximally informative with a non-vanishing (with $n$) informativeness
and the use of (3) in the estimation of $S$ is justified. We now investigate the
fundamental limit of principal eigenvector based signal estimation.

In the asymptotic setting when $\mu_n \to \mu_X$ and $z = \widetilde{\lambda}_1 \to \rho$ we have that

$$\int \frac{d\mu_n(t)}{(z - t)^2} \to \int \frac{d\mu_X(t)}{(\rho - t)^2} = -G'_{\mu_X}(\rho),$$

so that when $1/\theta < G_{\mu_X}(b^+)$, which implies that $\rho > b$, we have

$$|\langle \widetilde{u}_1, u \rangle|^2 \to -\frac{1}{\theta^2 \, G'_{\mu_X}(\rho)} > 0,$$

whereas when $1/\theta > G_{\mu_X}(b^+)$ and if $\mu_X$ is such that $G_{\mu_X}$ has infinite derivative at $\rho = b$,
we have

$$|\langle \widetilde{u}_1, u \rangle|^2 \to 0.$$

Hence when $\theta < 1/G_{\mu_X}(b^+)$ and if $G'_{\mu_X}(b^+) = -\infty$, then all the components have vanishing
(with $n$) informativeness. To summarize, when $\theta > 1/G_{\mu_X}(b^+)$:

Principal components are the most informative components when the noise
eigen-spectrum is supported on a single, connected interval.

The eigen-spectrum of a Wishart distributed sample covariance matrix with identity co- 
variance satisfies this condition. It is thus a happy coincidence that principal components 
are the most informative components for the simplest noise matrix model. We now con- 
sider the setting where the noise eigen-spectrum is (asymptotically) supported on multiple 
disconnected intervals. 
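
As a concrete check of the single-interval case, the sketch below assumes a symmetric Gaussian noise matrix whose limiting spectrum is the semicircle law on $[-2, 2]$; this noise model is chosen here for illustration and is not one of the examples above. For that law $G_{\mu_X}(z) = (z - \sqrt{z^2 - 4})/2$ for $z > 2$, so $G_{\mu_X}(b^+) = 1$, the largest eigenvalue separates when $\theta > 1$ and converges to $\rho = \theta + 1/\theta$, and the limiting informativeness $-1/(\theta^2 G'_{\mu_X}(\rho))$ equals $1 - 1/\theta^2$. The size, seed and variable names are hypothetical.

```matlab
% Sketch: principal eigenvalue and informativeness for semicircle noise (an assumption).
rng(4);
n = 600; theta = 2;                          % hypothetical size; theta above the threshold 1
A = randn(n);
X = (A + A') / sqrt(2*n);                    % symmetric Gaussian noise, spectrum near [-2, 2]
u = randn(n,1); u = u/norm(u);               % unit-norm signal vector
Xtil = X + theta*(u*u');                     % signal-plus-noise matrix

[Util, Dtil] = eig((Xtil + Xtil')/2);
[lam_til, idx] = sort(diag(Dtil), 'descend');
u1 = Util(:, idx(1));                        % principal eigenvector

rho_pred = theta + 1/theta;                  % solves G(rho) = 1/theta for the semicircle law
inf_pred = 1 - 1/theta^2;                    % -1/(theta^2 G'(rho)) for the semicircle law
inf_emp  = (u1' * u)^2;                      % empirical informativeness

fprintf('largest eigenvalue: %.3f (predicted limit %.3f)\n', lam_til(1), rho_pred);
fprintf('informativeness:    %.3f (predicted limit %.3f)\n', inf_emp, inf_pred);
fprintf('second eigenvalue:  %.3f (bulk edge is 2)\n', lam_til(2));
```

The agreement is only approximate at finite $n$; repeating the run with $\theta < 1$ should show the largest eigenvalue sticking to the bulk edge and the informativeness collapsing toward zero.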







Figure 5. The relationship in (12a) between the eigenvalues of $\widetilde{X} = \theta uu^* + X$
and the eigenvalues of $X$ is depicted here. Notice the interlacing of the
bulk eigenvalues and the emergence of the principal eigen-gap.

(a) When $1/\theta < G_{\mu_X}(b^+)$, $\lambda_1(\widetilde{X}) \to \rho = G_{\mu_X}^{-1}(1/\theta)$ and there is a principal eigen-gap.

Figure 6. The evolution of the informativeness of the principal eigen-gap
for different values of $\theta$. In (a), where $\theta$ is large, the principal eigen-gap is
informative; (b) the principal eigen-gap vanishes when $1/\theta = G_{\mu_X}(b^+)$ and
the signal is undetectable using principal eigen-gap based methods.
4. When middle components are informative

4.1. Asymptotic analysis: eigenvalues. Consider a setting where the eigenvalues of
the noise-only matrix $X$ are supported on multiple intervals as in Figure 2. This corresponds
to letting the $\mu_X$ obtained in the $n \to \infty$ limit of $\mu_{X_n}$ being supported on $\ell = 2$
intervals. Thus we may model $\mu_X$ as

$$\mu_X = p_1 \mu_1 + p_2 \mu_2,$$

where $p_1 + p_2 = 1$. Here $p := p_1 \in (0, 1)$ and the measures $\mu_1$ and $\mu_2$ are non-random
probability measures supported on $[a_1, b_1]$ and $[a_2, b_2]$, respectively with $d\mu_i(z) > 0$ for
$z \in (a_i, b_i)$ for $i = 1, 2$. We suppose that $a_2 < b_2 < a_1 < b_1$, as depicted in Figure
8-(a). For $k = 0, 1, 2$, define $c_k = \sum_{i=0}^{k} p_i$ with $p_0 := 0$, $c_0 := 0$ and $c_2 := 1$. We assume
that for $j = 1, 2$, $\lambda_{n c_{j-1} + 1} \to b_j$ and that $\lambda_{n c_j} \to a_j$. For expositional simplicity we
assume that $n c_j$ is an integer. When $X$ is a sample covariance matrix formed from a
matrix with Gaussian entries having a covariance matrix with an adequately-separated
covariance eigen-spectrum then the sample eigenvalues will satisfy this condition. Section
6 contains additional examples and elaborates on when the covariance eigen-spectrum is
separated enough.

The assumed convergence of the eigenvalues to a smooth limiting measure implies that
as $n \to \infty$, if there were no signal, the eigenvalues would have a continuous looking
spectrum as the spacing between successive eigenvalues goes to zero.

By the same reasoning, when there is a signal, the picture developed in Figure 5 when
adapted as in Figure 7 for the disjoint interval setting (here $\ell = 2$) reveals (via (12))
that the leading eigenvalue $\widetilde{\lambda}_1$ and an additional, middle, eigenvalue $\widetilde{\lambda}_{n c_1 + 1}$ will exhibit a
significant $O(1)$ deviation relative to the corresponding eigenvalue in the noise only setting.
The middle eigenvalue emerges from the bulk spectrum because $\lambda_{n c_1} - \lambda_{n c_1 + 1} = O(1)$ as
$n \to \infty$.

The remaining $n - 2$ eigenvalues will be displaced insignificantly and will remain tightly
packed together, thereby retaining their continuous looking appearance. Consequently,
there will be two $O(1)$ eigen-gaps in the spectrum betraying the presence of a low-rank
signal. Thus, here too, the use of the gap heuristic is justified.

The emergence of an informative middle eigenvalue in this setting due to the presence 
of a large gap in the noise eigen-spectrum may be viewed as a form of aliasing. 

We now investigate the fundamental limit of gap heuristic based signal detection so we 
might understand when not accounting for the middle eigen-gap might lead to suboptimal 
detection performance. 

As before, we note that the vector $v = Q^*u$ is uniformly distributed on the unit
hypersphere, and so, in the high-dimensional setting, $|v_i|^2 \approx 1/n$ (with high probability) so
that

$$\sum_{i=1}^{n} |v_i|^2 \delta_{\lambda_i} =: \mu_n \approx \mu_X := \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \delta_{\lambda_i}.$$

A consequence of $\mu_n \to \mu_X$ is that $G_{\mu_n}(z) \to G_{\mu_X}(z)$. Inverting equation (6) after
substituting these approximations yields the location of the largest eigenvalue, in the
$n \to \infty$ limit, to be $G_{\mu_X}^{-1}(1/\theta)$. This results in multiple (i.e., principal and middle) eigenvalues
that separate from the bulk spectrum precisely when the functional inverse is multi-valued
(for a domain outside the region of support).

Recall our assumption that the limiting probability measure of the noise-only random
matrix $\mu_X$ is compactly supported on $\ell = 2$ disjoint intervals $\{[a_i, b_i]\}_{i=1}^{\ell}$. Consequently,
the Cauchy transform $G_{\mu_X}$ given by (7) is well-defined for $z$ outside $\cup_{i=1}^{\ell} [a_i, b_i]$ and is
strictly decreasing with increasing $z$ on open intervals $(\cup_{i=1}^{\ell} [a_i, b_i])^c$ outside the support
of $\mu_X$, as depicted in Figure 8.

Thus so long as $1/\theta < G_{\mu_X}(b_1^+)$, $\widetilde{\lambda}_1 \to \rho_1 > b_1$ and an $O(1)$ principal eigen-gap will
manifest. Conversely, if $1/\theta > G_{\mu_X}(b_1^+)$, as in Figure 8-(b), $\widetilde{\lambda}_1 \to b_1$ and there will be no
principal eigen-gap.

Similarly, if $1/\theta < G_{\mu_X}(b_2^+)$ and $1/\theta > G_{\mu_X}(a_1^-)$, $\widetilde{\lambda}_{n c_1 + 1} = \widetilde{\lambda}_{np+1} \to \rho_2 > b_2$ and an $O(1)$
middle eigen-gap will manifest. Conversely, as shown in Figure 8-(c), if $1/\theta > G_{\mu_X}(b_2^+)$
then $\widetilde{\lambda}_{n c_1 + 1} = \widetilde{\lambda}_{np+1} \to b_2$ and there will be no $O(1)$ middle eigen-gap. However when
$1/\theta < G_{\mu_X}(a_1^-)$, then $\widetilde{\lambda}_{n c_1 + 1} = \widetilde{\lambda}_{np+1} \to a_1$ and technically speaking there is an $O(1)$
middle eigen-gap except that this gap is indistinguishable from the gap in the spectrum
that appears even when there is no signal.

Thus principal eigen-gap based signal detection for weak signals (or small $\theta$) fails whenever
$\theta < 1/G_{\mu_X}(b_1^+)$ while middle eigen-gap detection fails whenever $\theta < 1/G_{\mu_X}(b_2^+)$. If
$G_{\mu_X}(b_2^+) > G_{\mu_X}(b_1^+)$, as depicted in Figure 8, then a weak signal that is undetectable using
the principal eigen-gap heuristic would have remained detectable if the middle eigen-gap
were considered. This is why the middle eigen-gap in Figure 2 was informative while the
principal eigen-gap was not. In such settings, detection using only the principal eigen-gap
is suboptimal.



4.2. Asymptotic analysis: eigenvectors. Equation (12b) reveals that the informative- 
ness of an eigenvector Ui, relative to the signal eigenvector u is given by the expression 

1 1 



\(ui,u)\' 



where 



G'{z) 



1 A 1 



^ J (z - A,;) 2 ~ nj^{z-\Y 

The eigenvalues of the noise-only matrix are concentrated on the disjoint intervals $[a_1, b_1]$ 
and $[a_2, b_2]$. Thus, the average spacing between successive eigenvalues of $X$ within 
each interval is $O(1/n)$. Since the eigenvalues of $\tilde{X}$ interlace the eigenvalues of $X$, $\tilde{\lambda}_i - \lambda_i = 
O(1/n)$ for all but the largest eigenvalue and the middle eigenvalue, as in Figure 8-(a). 
Hence, for these indices, $G'_{\mu_n}(\tilde{\lambda}_i)$ is of order $n$ in magnitude, so that $\mathrm{Inf}_i = O(1/n)$. 

As before, we note that $\tilde{\lambda}_1 - \lambda_1 = O(1)$ so that $G'_{\mu_n}(\tilde{\lambda}_1) = O(1)$ and $\mathrm{Inf}_1 = O(1)$, implying 
that the principal eigenvector is informative with a non-vanishing (with $n$) informativeness 
and that the use of (3) in the estimation of $S$ is justified. However, what emerges from the 
picture in Figure 7 is that since $\lambda_{np} - \lambda_{np+1} = O(1)$ we have that $\tilde{\lambda}_{np+1} - \lambda_{np+1} = O(1)$ and, 
by the same argument, $\mathrm{Inf}_{np+1} = O(1)$ as well. Thus the middle eigenvector associated 
with the middle eigenvalue that exhibits an eigen-gap is also informative. Employing it 
in the estimation of $S$ would improve estimation performance. 
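
The finite-$n$ mechanism behind this argument can be sketched numerically. The example below rests on assumptions that are not in the text: the noise eigenvalues are placed on a deterministic grid (10 values on $[8, 12]$, 90 values on $[-2, 2]$), the signal vector has squared overlap exactly $1/n$ with every noise eigenvector, and $\theta = 6$. It uses the exact rank-one identities: the eigenvalues of $\theta uu^* + X$ solve $\sum_j w_j/(z - \lambda_j) = 1/\theta$ with $w_j = |\langle u, u_j\rangle|^2$, and $|\langle \tilde{u}_i, u\rangle|^2 = 1/(\theta^2 \sum_j w_j/(\tilde{\lambda}_i - \lambda_j)^2)$. The function names are hypothetical.

```python
def secular_root(lo, hi, lam, w, theta, iters=200):
    """Bisection for the root z in (lo, hi) of sum_j w_j / (z - lam_j) = 1/theta.

    On each interval used below the left-hand side is strictly decreasing in z.
    """
    target = 1.0 / theta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:      # interval has collapsed to machine precision
            break
        g = sum(wj / (mid - lj) for wj, lj in zip(w, lam))
        if g > target:
            lo = mid                    # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def informativeness(z, lam, w, theta):
    """Squared overlap between u and the perturbed eigenvector with eigenvalue z."""
    return 1.0 / (theta ** 2 * sum(wj / (z - lj) ** 2 for wj, lj in zip(w, lam)))


# Assumed illustrative setup: two well-separated groups of noise eigenvalues.
n, n_top, theta = 100, 10, 6.0
lam = [8.0 + 4.0 * k / (n_top - 1) for k in range(n_top)]
lam += [-2.0 + 4.0 * k / (n - n_top - 1) for k in range(n - n_top)]
lam.sort(reverse=True)
w = [1.0 / n] * n                       # |<u, u_j>|^2 for an evenly spread signal vector

# Largest perturbed eigenvalue, then one root between each pair of consecutive noise eigenvalues.
roots = [secular_root(lam[0], lam[0] + theta, lam, w, theta)]
roots += [secular_root(lam[i], lam[i - 1], lam, w, theta) for i in range(1, n)]
inf = [informativeness(z, lam, w, theta) for z in roots]

inf_top = inf[0]                        # principal eigenvector
inf_mid = inf[n_top]                    # eigenvector of the eigenvalue inside the spectral gap
print(f"Inf_principal = {inf_top:.3f}, Inf_middle = {inf_mid:.3f}, total = {sum(inf):.6f}")
```

In this assumed configuration the eigenvalue that lands inside the gap carries more of the signal than the principal one, while the overlaps of all $n$ eigenvectors add up to one, as they must for an orthonormal basis.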
Figure 7. The relationship in (12) between the eigenvalues of $\tilde{X} = \theta uu^* + X$ 
and those of $X$ is depicted here when the eigenvalues of $X$ are 
supported on two $O(1)$-separated intervals. Notice the interlacing of the 
bulk eigenvalues and the emergence of the principal and middle eigen-gaps. 
Contrast this with Figure 5, where only the principal eigen-gap emerges. 

Extending this argument further, in the $n \to \infty$ limit, when $\mu_n \to \mu_X$, suppose $\mu_X$ is 
such that $G'_{\mu_X}(z) = -\infty$ for $z = b_1^+, b_2^+$. Then we have that whenever $1/\theta < G_{\mu_X}(b_1^+)$, 

\[ |\langle \tilde{u}_1, u \rangle|^2 \xrightarrow{a.s.} -\frac{1}{\theta^2 G'_{\mu_X}(\rho_1)} > 0, \]

but if $1/\theta > G_{\mu_X}(b_1^+)$, as in Figure 8-(b),(c), then $|\langle \tilde{u}_1, u \rangle|^2 \xrightarrow{a.s.} 0$ and the principal 
component becomes uninformative. Employing the same argument for the middle eigenvector 
reveals that so long as $G_{\mu_X}(a_1^-) < 1/\theta < G_{\mu_X}(b_2^+)$, then $\tilde{\lambda}_{np+1} \xrightarrow{a.s.} \rho_2$ and the 
corresponding eigenvector is informative, i.e., 

\[ |\langle \tilde{u}_{np+1}, u \rangle|^2 \xrightarrow{a.s.} -\frac{1}{\theta^2 G'_{\mu_X}(\rho_2)} > 0. \]

When $1/\theta > G_{\mu_X}(b_2^+)$, as in Figure 8-(c), then 

\[ |\langle \tilde{u}_{np+1}, u \rangle|^2 \xrightarrow{a.s.} 0, \]

and the middle component becomes uninformative. Evidently, if $1/G_{\mu_X}(b_2^+) < 1/G_{\mu_X}(b_1^+)$, 
as in Figure 8, then the middle eigenvector will stay informative for a regime of small $\theta$ 
where the principal eigenvector is uninformative. More generally, if both the principal and 
the middle eigenvectors are informative, then the principal eigenvector will be more informative 
if $-1/G'_{\mu_X}(\rho_1) > -1/G'_{\mu_X}(\rho_2)$, and vice versa. This is determined by the structure of the 
noise spectrum. To summarize: 

• Principal gap based signal detection will asymptotically succeed iff $\theta > 1/G_{\mu_X}(b_1^+)$. 

• Middle gap based detection will asymptotically succeed despite principal gap based 
detection failing for $1/G_{\mu_X}(b_2^+) < \theta < 1/G_{\mu_X}(b_1^+)$, a regime that is non-empty 
whenever $G_{\mu_X}(b_2^+) > G_{\mu_X}(b_1^+)$. 

• The eigenvectors associated with principal or middle eigenvalues that exhibit an 
eigen-gap will be informative. 

• The eigenvectors will be uninformative when the eigen-gap vanishes. 

The emergence of informative middle eigenvalues and eigenvectors whenever there is a gap 
in the noise eigen-spectrum may be viewed as a form of signal (subspace) aliasing. 
(c) When $1/\theta > G_{\mu_X}(b_1^+)$, $\tilde{\lambda}_1 \to b_1$. Since $1/\theta = G_{\mu_X}(b_2^+)$, $\tilde{\lambda}_{np+1} \to b_2$. 

Figure 8. The evolution of the informativeness of the principal and the 
middle eigen-gaps for different values of $\theta$. In (a), where $\theta$ is large, both 
the principal and middle eigen-gaps are informative; in (b) the principal 
eigen-gap vanishes but the middle eigen-gap persists; in (c) both the principal and 
middle eigen-gaps vanish and the signal is undetectable using eigen-gap 
based methods. The important point to note here is that the middle 
eigen-gap reveals the presence of a signal even when the principal eigen-gap does 
not. 

5. Main results 

5.1. Eigenvalues and Eigenvectors. Let $X_n$ be an $n \times n$ symmetric (or Hermitian) 
random matrix whose ordered eigenvalues we denote by $\lambda_1(X_n) \ge \cdots \ge \lambda_n(X_n)$. Let $\mu_{X_n}$ 
be the empirical eigenvalue distribution, i.e., the probability measure defined as 

\[ \mu_{X_n} = \frac{1}{n}\sum_{i=1}^{n} \delta_{\lambda_i(X_n)}. \]

Assume that the probability measure $\mu_{X_n}$ converges almost surely weakly, as $n \to \infty$, 
to a non-random compactly supported probability measure $\mu_X$ that is supported on $\ell$ 
disjoint intervals so that 

\[ \mu_X = \sum_{j=1}^{\ell} p_j\, \mu_j, \]

where for $j = 1, \dots, \ell$, the measure $\mu_j$ is a non-random probability measure 
supported on $[a_j, b_j]$ with $d\mu_j(z) > 0$ for $z \in (a_j, b_j)$ and $a_\ell < b_\ell < a_{\ell-1} < b_{\ell-1} < \dots < 
a_1 < b_1$. Define $c_j = \sum_{i=0}^{j} p_i$ with $p_0 := 0$, $c_0 := 0$ and $c_\ell := 1$. We assume that for 
$j = 1, \dots, \ell$, $\lambda_{\lceil nc_{j-1} \rceil + 1} \xrightarrow{a.s.} b_j$ and that $\lambda_{\lceil nc_j \rceil} \xrightarrow{a.s.} a_j$, where $\lceil nc_{j-1} \rceil$ denotes the smallest 
integer greater than or equal to $nc_{j-1}$. 

For a given $r \ge 1$, let $\theta_1 \ge \cdots \ge \theta_r$ be deterministic non-zero real numbers, chosen 
independently of $n$. For every $n$, let $P_n$ be an $n \times n$ symmetric (or Hermitian) 
matrix having rank $r$ with its $r$ non-zero eigenvalues equal to $\theta_1, \dots, \theta_r$. 

Recall that a symmetric (or Hermitian) random matrix is said to be orthogonally 
invariant (or unitarily invariant) if its distribution is invariant under the action of the 
orthogonal (or unitary) group under conjugation. 

We suppose that $X_n$ and $P_n$ are independent and that $X_n$, the noise-only matrix, is 
unitarily invariant while the low-rank signal matrix $P_n$ is non-random. 

5.1.1. Notation. Throughout this paper, for $f$ a function and $c \in \mathbb{R}$, we set 

\[ f(c^+) := \lim_{z \downarrow c} f(z); \qquad f(c^-) := \lim_{z \uparrow c} f(z), \]

we also let $\xrightarrow{a.s.}$ denote almost sure convergence. The ordered eigenvalues of an $n \times n$ 
Hermitian matrix $M$ will be denoted by $\lambda_1(M) \ge \cdots \ge \lambda_n(M)$. Lastly, for a subspace 
$F$ of a Euclidean space $E$ and a vector $x \in E$, we denote the norm of the orthogonal 
projection of $x$ onto $F$ by $\langle x, F \rangle$. 

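Consider the rank $r$ additive perturbation of the random matrix $X_n$ given by 

\[ \tilde{X} = X_n + P_n. \]

For this model, we establish the following results. 

Theorem 5.1 (Eigen-gap phase transition). The eigenvalues of $\tilde{X}$ exhibit the following 
behavior as $n \to \infty$. We have that for each $1 \le i \le r$ and $1 \le j \le \ell$, 

\[ \lambda_{\lceil nc_{j-1} \rceil + i}(\tilde{X}) \xrightarrow{a.s.} \begin{cases} G^{-1}_{\mu_X,(b_j,a_{j-1})}(1/\theta_i) & \text{if } G_{\mu_X}(a_{j-1}^-) < 1/\theta_i < G_{\mu_X}(b_j^+), \\ b_j & \text{if } \theta_i < 1/G_{\mu_X}(b_j^+), \\ a_{j-1} & \text{if } \theta_i > 1/G_{\mu_X}(a_{j-1}^-). \end{cases} \]

Here, 

\[ G_{\mu_X}(z) := \int \frac{1}{z - t}\, d\mu_X(t) \quad \text{for } z \notin \operatorname{supp} \mu_X, \]

is the Cauchy transform of $\mu_X$, $G^{-1}_{\mu_X,(b_j,a_{j-1})}(\cdot)$ is the functional inverse of $G_{\mu_X}(z)$ for 
$z \in (b_j, a_{j-1})$, and $a_0 := +\infty$. 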

Proof. The result is obtained by following the approach taken in [4, pp. 511-514] for 
proving Theorem 2.1. The key difference is that we are explicitly considering measures 
$\mu_X$ supported on multiple (disconnected) intervals, so that the Cauchy transform of $\mu_X$ 
can have multiple inverses, as in Figure 8. For those values of $\theta$ such that $G^{-1}_{\mu_X}(1/\theta)$ is 
multi-valued, as many eigenvalues of $\tilde{X}$ as there are values of $z$ such that $z = G^{-1}_{\mu_X}(1/\theta)$ 
will exhibit the eigen-gaps identified. □ 
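
As a sanity check on the statement, it may help to work out the single-interval case $\ell = 1$ for a noise law where everything is explicit. The choice of the semicircle law of radius $2\sigma$ below is an assumption made for illustration; $\sigma$ is not a symbol used elsewhere in this section.

```latex
% Worked example (assumed noise law: semicircle on [-2\sigma, 2\sigma], so \ell = 1, b_1 = 2\sigma).
\begin{align*}
G_{\mu_X}(z) &= \frac{z - \sqrt{z^2 - 4\sigma^2}}{2\sigma^2}, \qquad z > 2\sigma,
  & G_{\mu_X}(b_1^+) &= \frac{1}{\sigma}. \\
\intertext{Solving $G_{\mu_X}(\rho) = 1/\theta$ for $\rho > 2\sigma$:}
\sqrt{\rho^2 - 4\sigma^2} &= \rho - \frac{2\sigma^2}{\theta}
  \;\Longrightarrow\; \rho = \theta + \frac{\sigma^2}{\theta}. \\
\intertext{Hence Theorem 5.1 with $j = 1$ reads}
\lambda_1(\tilde{X}) &\xrightarrow{a.s.}
  \begin{cases}
    \theta + \sigma^2/\theta & \text{if } \theta > \sigma, \\
    2\sigma & \text{if } \theta < \sigma.
  \end{cases}
\end{align*}
```

With a single interval the inverse is single-valued and only the principal eigen-gap can appear. The multi-interval case differs only in that $G_{\mu_X}(\rho) = 1/\theta$ may also have a solution inside a gap $(b_j, a_{j-1})$ of the support.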

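Theorem 5.2 (Informativeness of the eigenvectors). Assume throughout that $\theta_i > 0$ and 
let $a_0 := +\infty$, so that $G_{\mu_X}(a_0^-) = 0$. Consider $i_0 \in \{1, \dots, r\}$ such that $1/\theta_{i_0} \in \bigcup_{j=1}^{\ell} (G_{\mu_X}(a_{j-1}^-), G_{\mu_X}(b_j^+))$. 
For each such $i_0$, consider $j(i_0) = j \in \{1, \dots, \ell\}$ such that $1/\theta_{i_0} \in (G_{\mu_X}(a_{j-1}^-), G_{\mu_X}(b_j^+))$ 
and let $\tilde{u}$ be a unit-norm eigenvector of $\tilde{X}$ associated with the eigenvalue $\tilde{\lambda}_{\lceil nc_{j-1} \rceil + i_0}$. Then 
we have, as $n \to \infty$, 

(a) 
\[ |\langle \tilde{u}, \ker(\theta_{i_0} I_n - P_n) \rangle|^2 \xrightarrow{a.s.} -\frac{1}{\theta_{i_0}^2\, G'_{\mu_X}(\rho)}, \]
where $\rho$ is the limit of $\tilde{\lambda}_{\lceil nc_{j-1} \rceil + i_0}$ given by Theorem 5.1; 

(b) 
\[ \langle \tilde{u}, \oplus_{i \ne i_0} \ker(\theta_i I_n - P_n) \rangle \xrightarrow{a.s.} 0. \]

Proof. The result is obtained by following the approach taken in [4, pp. 514-516] for 
proving Theorem 2.2 and accounting for the possibly multi-valued nature of $G^{-1}_{\mu_X}(1/\theta)$. □ 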

Theorem 5.3 (Phase transition of eigenvector informativeness). When $r = 1$, let the sole 
non-zero eigenvalue of $P_n$ be denoted by $\theta$. Consider $j \in \{1, \dots, \ell\}$ such that 

\[ \frac{1}{\theta} \notin (G_{\mu_X}(a_{j-1}^-), G_{\mu_X}(b_j^+)), \quad \text{and} \quad G'_{\mu_X}(b_j^+) = -\infty \ \text{ and } \ G'_{\mu_X}(a_{j-1}^-) = -\infty. \]

For each $n$, let $\tilde{u}$ be a unit-norm eigenvector of $\tilde{X}$ associated with $\tilde{\lambda}_{\lceil nc_{j-1} \rceil + 1}$. Then we 
have 

\[ \langle \tilde{u}, \ker(\theta I_n - P_n) \rangle \xrightarrow{a.s.} 0, \]

as $n \to \infty$. 

Proof. The result is obtained by following the approach taken in [4, pp. 516-517] for 
proving Theorem 2.3 and accounting for the possibly multi-valued nature of $G^{-1}_{\mu_X}(1/\theta)$. □ 

The following proposition allows us to assert that in many classical matrix models, such 
as Wigner or Wishart matrices, the above phase transitions actually occur with a finite 
threshold. 

Proposition 5.4 (Edge density decay condition and the phase transition). Assume that 
the limiting eigenvalue distribution $\mu_X$, supported on $\ell$ disjoint intervals, has a density $f_{\mu_X}$ 
with a power decay at $b_j$ for $j = 1, \dots, \ell$, i.e., that, as $t \to b_j$ with $t < b_j$, $f_{\mu_X}(t) \sim c\,(b_j - t)^{\alpha}$ 
for some exponent $\alpha > -1$ and some constant $c$. Then: 

\[ G_{\mu_X}(b_j^+) < \infty \iff \alpha > 0 \quad \text{and} \quad G'_{\mu_X}(b_j^+) = -\infty \iff \alpha \le 1. \]

Similarly, suppose $f_{\mu_X}$ has a power decay at $a_{j-1}$ for $j = 2, \dots, \ell$, i.e., that, as $t \to a_{j-1}$ with 
$t > a_{j-1}$, $f_{\mu_X}(t) \sim c\,(t - a_{j-1})^{\alpha}$ for some exponent $\alpha > -1$ and some constant $c$. Then 

\[ G_{\mu_X}(a_{j-1}^-) > -\infty \iff \alpha > 0 \quad \text{and} \quad G'_{\mu_X}(a_{j-1}^-) = -\infty \iff \alpha \le 1. \]
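
The exponent conditions in the proposition follow from a one-line integrability check near the edge. The sketch below is a heuristic restatement under the stated power-decay assumption, with $\varepsilon > 0$ a small cutoff introduced only for this illustration; it is not the proof in the cited references.

```latex
% Heuristic check at the upper edge b_j, using the substitution s = b_j - t.
\begin{align*}
G_{\mu_X}(b_j^+) &\supset \int_{b_j - \varepsilon}^{b_j} \frac{c\,(b_j - t)^{\alpha}}{b_j - t}\,dt
   = c \int_0^{\varepsilon} s^{\alpha - 1}\,ds
   &&\text{finite} \iff \alpha > 0, \\
G'_{\mu_X}(b_j^+) &\supset -\int_{b_j - \varepsilon}^{b_j} \frac{c\,(b_j - t)^{\alpha}}{(b_j - t)^2}\,dt
   = -c \int_0^{\varepsilon} s^{\alpha - 2}\,ds
   &&= -\infty \iff \alpha \le 1.
\end{align*}
% Example: a square-root edge (\alpha = 1/2), as for the semicircle law,
% gives a finite threshold 1/G(b_j^+) together with G'(b_j^+) = -\infty.
```

Here "$\supset$" is informal shorthand for "contains the contribution"; the remainder of each integral, away from the edge, is finite. The lower-edge statement is obtained in the same way with $s = t - a_{j-1}$.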



Theorem 5.1 describes the fundamental limits of eigen-gap based signal detection. 
Principal eigen-gap detection will fail whenever $\theta < 1/G_{\mu_X}(b_1^+)$. If $G_{\mu_X}(b_j^+) > G_{\mu_X}(b_1^+)$ for 
some $j = 2, \dots, \ell$, then principal eigen-gap detection will be suboptimal as the middle eigen-gaps 
will reveal the presence of a low-rank signal even when the principal eigen-gap does not. 
Theorem 5.2 shows that whenever there is an eigen-gap, the corresponding eigenvectors 
will be informative. Theorem 5.3 provides insight on the fundamental limits of low-rank 
signal matrix estimation. 



5.2. Singular values and singular vectors. Let $X_n$ be an $n \times m$ ($n \le m$, without 
loss of generality) random matrix whose ordered singular values we denote by $\sigma_1(X_n) \ge 
\cdots \ge \sigma_n(X_n)$. Let $\mu_{X_n}$ be the empirical singular value distribution, i.e., the probability 
measure defined as 

\[ \mu_{X_n} = \frac{1}{n}\sum_{i=1}^{n} \delta_{\sigma_i(X_n)}. \]

As before, assume that the probability measure $\mu_{X_n}$ converges almost surely weakly, as 
$n \to \infty$, to a non-random compactly supported probability measure $\mu_X$ that is supported 
on $\ell$ disjoint intervals so that 

\[ \mu_X = \sum_{j=1}^{\ell} p_j\, \mu_j, \]

where for $j = 1, \dots, \ell$, the measure $\mu_j$ is a non-random probability measure 
supported on $[a_j, b_j]$ with $d\mu_j(z) > 0$ for $z \in (a_j, b_j)$ and $a_\ell < b_\ell < a_{\ell-1} < b_{\ell-1} < \dots < 
a_1 < b_1$. Define $c_j = \sum_{i=0}^{j} p_i$ with $p_0 := 0$, $c_0 := 0$ and $c_\ell := 1$. We assume that for 
$j = 1, \dots, \ell$, $\sigma_{\lceil nc_{j-1} \rceil + 1} \xrightarrow{a.s.} b_j$ and that $\sigma_{\lceil nc_j \rceil} \xrightarrow{a.s.} a_j$. As before, we use $\lceil nc_{j-1} \rceil$ to denote 
the smallest integer greater than or equal to $nc_{j-1}$. 

For a given $r \ge 1$, let $\theta_1 \ge \cdots \ge \theta_r$ be deterministic non-zero real numbers, chosen 
independently of $n$. For every $n$, let $P_n$ be an $n \times m$ matrix having rank $r$ with its $r$ 
non-zero singular values equal to $\theta_1, \dots, \theta_r$. We suppose that $X_n$ and $P_n$ are independent 
and that $X_n$, the noise-only matrix, is bi-unitarily invariant while the low-rank signal 
matrix $P_n$ is deterministic. Recall that a random matrix is said to be bi-orthogonally 
invariant (or bi-unitarily invariant) if its distribution is invariant under multiplication on 
the left and right by orthogonal (or unitary) matrices. Alternately, if $P_n$ has isotropically 
random right (or left) singular vectors, then $X_n$ need not be invariant 
under multiplication on the right (or left, resp.) by orthogonal or unitary matrices. 
Equivalently, $X_n$ can have deterministic right and left singular vectors while $P_n$ has 
isotropically random left and right singular vectors, and we would get the same results 
stated shortly. 

Consider the rank $r$ additive perturbation of the random matrix $X_n$ given by 

\[ \tilde{X} = X_n + P_n, \]

where 

\[ P_n = \sum_{i=1}^{r} \theta_i u_i v_i^*, \]

and $\{u_i\}_{i=1}^{r}$ and $\{v_i\}_{i=1}^{r}$ are the left and right singular vectors, respectively, of $P_n$. 
For this model, we establish the following results. 

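Theorem 5.5 (Largest singular value phase transition). The singular values of $\tilde{X}$ exhibit 
the following behavior as $n, m_n \to \infty$ and $n/m_n \to c$. We have that for each $1 \le i \le r$ 
and $1 \le j \le \ell$, 

\[ \sigma_{\lceil nc_{j-1} \rceil + i}(\tilde{X}) \xrightarrow{a.s.} \begin{cases} D^{-1}_{\mu_X,(b_j,a_{j-1})}(1/\theta_i^2) & \text{if } D_{\mu_X}(a_{j-1}^-) < 1/\theta_i^2 < D_{\mu_X}(b_j^+), \\ b_j & \text{if } \theta_i^2 < 1/D_{\mu_X}(b_j^+), \\ a_{j-1} & \text{if } \theta_i^2 > 1/D_{\mu_X}(a_{j-1}^-), \end{cases} \]

where $D_{\mu_X}$, the D-transform of $\mu_X$, is defined by 

\[ D_{\mu_X}(z) := \left[ \int \frac{z}{z^2 - t^2}\, d\mu_X(t) \right] \times \left[ c \int \frac{z}{z^2 - t^2}\, d\mu_X(t) + \frac{1 - c}{z} \right] \quad \text{for } z \notin \cup_{j=1}^{\ell} [a_j, b_j], \]

and $D^{-1}_{\mu_X,(b_j,a_{j-1})}(\cdot)$ will denote its functional inverse on $(b_j, a_{j-1})$ with $a_0 := +\infty$. 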

Proof. The result is obtained by following the approach taken in [5, pp. 127-129] for 
proving Theorem 2.9 and accounting for the possibly multi-valued nature of $D^{-1}_{\mu_X}(\cdot)$. 
The key ingredient of the proof is the recognition that the non-zero, positive eigenvalues 
of 

\[ \begin{bmatrix} 0 & \tilde{X} \\ \tilde{X}^* & 0 \end{bmatrix} = \begin{bmatrix} 0 & X \\ X^* & 0 \end{bmatrix} + \begin{bmatrix} 0 & \sum_{i=1}^{r} \theta_i u_i v_i^* \\ \sum_{i=1}^{r} \theta_i v_i u_i^* & 0 \end{bmatrix} \]

are precisely the singular values of $\tilde{X}$. Thus adopting the approach outlined for the eigenvalue setting, 
while taking into account the structured rank $2r$ perturbation, gives us the stated result. 

□ 
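
The dilation identity used in the proof is easy to check numerically. The following is a minimal sketch with arbitrary small dimensions and a rank-one signal ($r = 1$); the sizes, the value of `theta`, and the helper name `dilation` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, theta = 4, 6, 3.0

# Noise matrix and a rank-one signal theta * u v^T with unit-norm u and v.
X = rng.standard_normal((n, m)) / np.sqrt(m)
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
v = rng.standard_normal(m)
v /= np.linalg.norm(v)
P = theta * np.outer(u, v)
X_tilde = X + P


def dilation(A):
    """Hermitian dilation [[0, A], [A^*, 0]] of a rectangular matrix A."""
    r, c = A.shape
    return np.block([[np.zeros((r, r)), A], [A.conj().T, np.zeros((c, c))]])


noise_part = dilation(X)
signal_part = dilation(P)                 # structured perturbation of rank 2r = 2
H = noise_part + signal_part              # equals the dilation of X_tilde

eigs = np.linalg.eigvalsh(H)              # ascending order
positive_eigs = eigs[::-1][:n]            # the n largest eigenvalues, descending
svals = np.linalg.svd(X_tilde, compute_uv=False)

print(np.round(positive_eigs, 6))
print(np.round(svals, 6))
```

The spectrum of the dilation consists of the singular values of $\tilde{X}$, their negatives, and $m - n$ zeros, which is why only the positive eigenvalues are compared.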

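Theorem 5.6 (Informativeness of singular vectors). Assume throughout that $\theta_i > 0$ and let 
$a_0 := +\infty$, so that $D_{\mu_X}(a_0^-) = 0$. Consider $i_0 \in \{1, \dots, r\}$ such that $1/\theta_{i_0}^2 \in \bigcup_{j=1}^{\ell} (D_{\mu_X}(a_{j-1}^-), D_{\mu_X}(b_j^+))$. 
For each such $i_0$, consider $j(i_0) = j \in \{1, \dots, \ell\}$ such that $1/\theta_{i_0}^2 \in (D_{\mu_X}(a_{j-1}^-), D_{\mu_X}(b_j^+))$ 
and let $\tilde{u}$ and $\tilde{v}$ be unit-norm left and right singular vectors of $\tilde{X}$ associated with the 
singular value $\tilde{\sigma}_{\lceil nc_{j-1} \rceil + i_0}$. Then we have, as $n \to \infty$, 

a) 
\[ |\langle \tilde{u}, \operatorname{Span}\{u_i ;\ \theta_i = \theta_{i_0}\} \rangle|^2 \xrightarrow{a.s.} \frac{-2\,\varphi_{\mu_X}(\rho)}{\theta_{i_0}^2\, D'_{\mu_X}(\rho)}, \tag{13} \]

b) 
\[ |\langle \tilde{v}, \operatorname{Span}\{v_i ;\ \theta_i = \theta_{i_0}\} \rangle|^2 \xrightarrow{a.s.} \frac{-2\,\varphi_{\tilde{\mu}_X}(\rho)}{\theta_{i_0}^2\, D'_{\mu_X}(\rho)}, \tag{14} \]

where $\rho$ is the limit of $\tilde{\sigma}_{\lceil nc_{j-1} \rceil + i_0}$ given by Theorem 5.5, $\tilde{\mu}_X = c\mu_X + (1 - c)\delta_0$, and for any 
probability measure $\mu$, 

\[ \varphi_{\mu}(z) := \int \frac{z}{z^2 - t^2}\, d\mu(t). \tag{15} \]

c) Furthermore, in the same asymptotic limit, we have 

\[ |\langle \tilde{u}, \operatorname{Span}\{u_i ;\ \theta_i \ne \theta_{i_0}\} \rangle|^2 \xrightarrow{a.s.} 0, \quad \text{and} \quad |\langle \tilde{v}, \operatorname{Span}\{v_i ;\ \theta_i \ne \theta_{i_0}\} \rangle|^2 \xrightarrow{a.s.} 0, \]

and 

\[ \langle \varphi_{\mu_X}(\rho) P_n \tilde{v} - \tilde{u}, \operatorname{Span}\{u_i ;\ \theta_i = \theta_{i_0}\} \rangle \xrightarrow{a.s.} 0. \]

Proof. The result is obtained by following the approach taken in [5, pp. 129-131] for 
proving Theorem 2.10 and accounting for the possibly multi-valued nature of $D^{-1}_{\mu_X}(\cdot)$. 

□ 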

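Theorem 5.7 (Phase transition of vector informativeness). When $r = 1$, let the sole 
singular value of $P_n$ be denoted by $\theta$. Consider $j \in \{1, \dots, \ell\}$ such that 

\[ \frac{1}{\theta^2} \notin (D_{\mu_X}(a_{j-1}^-), D_{\mu_X}(b_j^+)), \quad \text{and} \quad D'_{\mu_X}(b_j^+) = -\infty \ \text{ and } \ D'_{\mu_X}(a_{j-1}^-) = -\infty. \]

For each $n$, let $\tilde{u}$ and $\tilde{v}$ be unit-norm left and right singular vectors of $\tilde{X}$ associated with 
$\tilde{\sigma}_{\lceil nc_{j-1} \rceil + 1}$. Then we have that 

\[ \langle \tilde{u}, \ker(\theta^2 I_n - P_n P_n^*) \rangle \xrightarrow{a.s.} 0, \quad \text{and} \quad \langle \tilde{v}, \ker(\theta^2 I_m - P_n^* P_n) \rangle \xrightarrow{a.s.} 0, \]

as $n \to \infty$. 

Proof. The result is obtained by following the approach taken in [5, p. 131] for proving 
Theorem 2.11 and accounting for the possibly multi-valued nature of $D^{-1}_{\mu_X}(\cdot)$. □ 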



Theorem 5.5 describes the fundamental limits of eigen-gap based signal detection. 
Principal gap detection will fail whenever $\theta^2 < 1/D_{\mu_X}(b_1^+)$. If $D_{\mu_X}(b_j^+) > D_{\mu_X}(b_1^+)$ for 
some $j = 2, \dots, \ell$, then principal eigen-gap detection will be suboptimal as the middle 
eigen-gaps will reveal the presence of a low-rank signal even when the principal eigen-gap does 
not. Theorem 5.6 shows that whenever there is an eigen-gap, the corresponding singular 
vectors will be informative. Theorem 5.7 provides insight on the fundamental limits of 
low-rank signal matrix estimation. The analog of Proposition 5.4 also applies here. 

6. Noise models that might produce informative middle components 

Our discussion has brought into sharp focus the pivotal role played by the noise 
eigen-spectrum in determining the relative informativeness of the principal and middle 
components of the singular value (or eigen) decomposition of signal-plus-noise data matrix 
models as in (1). 

Specifically, we showed that if the noise eigen-spectrum is supported on a single 
connected interval then the principal components will indeed (with high probability) be the 
most informative components and their use in detection and estimation is justified. 

However, if the noise eigen-spectrum is supported on multiple intervals, as in Figure 
8, then the principal components will remain informative in the high SNR regime (i.e., 
large $\theta$). For moderate to low SNR, however, the middle components might also be 
informative and may remain informative even when the principal components are no 
longer informative. In such settings, identifying large middle eigen-gaps and using the 
associated middle eigenvectors can improve inference. 

This leads to a natural question: When will the noise eigen-spectrum exhibit a discon- 
nected spectrum? 

We conclude by identifying a large class of Gaussian mixture models that produce 
precisely such an eigen-spectrum. Consider the class of noise matrices modeled as 

\[ X = G\Sigma^{1/2}, \]

where $G$ is an $m \times n$ matrix with i.i.d. mean zero, variance $1/m$ (say) Gaussian entries. 
If the rows of $X$ denote spatial measurements and the columns represent temporal 
measurements, then $\Sigma$ is a temporal covariance matrix and $XX^*$ is a Wishart distributed 
matrix. These models arise in many statistical signal processing and machine learning 
applications where PCA/SVD is often used as the first step in the inferential process (see, 
e.g., [32]). 

Bai and Silverstein characterize the limiting eigenvalue distribution of $XX^*$ in [29]. 
What emerges from their analysis [1, 31] is that for the noise eigen-spectrum of $XX^*$ 
to be disconnected, the eigenvalue spectrum of $\Sigma$ has to have a limiting 
distribution that is supported on disconnected intervals. In addition, the separation 
between these intervals has to be relatively large for the spectrum of $X$ to be supported 
on disconnected intervals. There is no simple formula for how large this separation has 
to be. There are expressions in [29] for the form of the spectrum of $X$ as a function of 
the spectrum of $\Sigma$, from which it can be ascertained whether the support consists of 
multiple intervals or not using the results in [1, 31]. Moreover, the spectrum will exhibit 
square-root decay at the edges [30] and so the phase transitions described will manifest. 
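
One way to carry out such a check numerically is sketched below. It relies on a characterization that is an assumption here, in the sense that it is recalled from the literature on [29, 30] rather than stated in this paper: with $H$ the limiting spectral distribution of $\Sigma$ and $c = n/m$, define $z(s) = -1/s + c \int t/(1 + ts)\, dH(t)$ for real $s$; a point $x = z(s)$ lies outside the support of the limiting spectrum of $X^*X$ exactly when $z'(s) > 0$. The function names and the grid size are hypothetical, and the code handles only a two-atom $H$.

```python
def z_value(s, atoms, c):
    """z(s) = -1/s + c * sum_k w_k t_k / (1 + t_k s) for atoms (w_k, t_k) of H."""
    return -1.0 / s + c * sum(w * t / (1.0 + t * s) for w, t in atoms)


def z_prime(s, atoms, c):
    """Derivative of z(s) with respect to s."""
    return 1.0 / s ** 2 - c * sum(w * t * t / (1.0 + t * s) ** 2 for w, t in atoms)


def spectral_gap(atoms, c, grid=20000):
    """Edges (lower, upper) of the gap in the limiting spectrum of X^* X between the
    clusters generated by two atoms of H, or None if the support is connected."""
    (_, t_hi), (_, t_lo) = sorted(atoms, key=lambda a: -a[1])   # assumes exactly two atoms
    s_min, s_max = -1.0 / t_lo, -1.0 / t_hi
    pts = [s_min + (s_max - s_min) * k / grid for k in range(1, grid)]
    increasing = [s for s in pts if z_prime(s, atoms, c) > 0]
    if not increasing:
        return None
    return z_value(increasing[0], atoms, c), z_value(increasing[-1], atoms, c)


# Sigma with 10% of its eigenvalues equal to 20 and 90% equal to 1, and n/m = 1.
gap = spectral_gap([(0.1, 20.0), (0.9, 1.0)], c=1.0)
print(gap)                          # eigenvalue scale; take square roots for singular values

# A much smaller separation (2 versus 1) leaves the support connected.
no_gap = spectral_gap([(0.1, 2.0), (0.9, 1.0)], c=1.0)
print(no_gap)
```

Under these assumptions the first configuration has a gap between roughly 3.6 and 10.3 on the eigenvalue scale, while the second has none, which illustrates the remark that the separation in the spectrum of $\Sigma$ has to be relatively large.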

Figure 9 plots the evolution of the $i$-th singular value and $\mathrm{Inf}_i = |\langle \tilde{u}_i, u \rangle|^2$ as a function of 
$\theta$ for the model in (1) with $X = G\Sigma^{1/2}$ and $\Sigma = \mathrm{diag}(20 I_{n/10}, I_{n - n/10})$. The figure clearly 
shows the phase transition in the informativeness of the principal and middle components 
and shows that there is a low SNR regime where the middle component is informative 
even when the principal component is not. The values where the phase transitions occur 
can be theoretically predicted, if so desired, using Theorem 5.5. Figure 2(a) shows a 
sample realization of the singular values for the same setting when $\theta = 2$. 
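
A single realization of this experiment can be sketched as follows. This is not the code used to produce Figure 9: the dimensions are reduced to $n = m = 400$ to keep the run short, only one trial is drawn instead of 250, and the seed, the random choice of $u$ and $v$, and all variable names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = m = 400                       # smaller than the n = m = 1000 used for Figure 9
theta = 2.0                       # signal strength, as in Figure 2(a)
k = n // 10                       # size of the high-variance block of Sigma

# Sigma = diag(20 I_{n/10}, I_{n - n/10}); only its square root is needed.
sigma_sqrt = np.sqrt(np.concatenate([np.full(k, 20.0), np.ones(n - k)]))
G = rng.standard_normal((m, n)) / np.sqrt(m)    # i.i.d. entries with variance 1/m
X = G * sigma_sqrt                               # X = G Sigma^{1/2} (scales column j)

# Noise-only singular values: locate the largest gap between consecutive values.
s_noise = np.linalg.svd(X, compute_uv=False)     # sorted in decreasing order
gaps = s_noise[:-1] - s_noise[1:]
gap_index = int(np.argmax(gaps))                 # 0-based index of the value just above the gap

# Rank-one signal with isotropically random unit-norm singular vectors.
u = rng.standard_normal(m)
u /= np.linalg.norm(u)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
U, s, Vh = np.linalg.svd(X + theta * np.outer(u, v))

inf_principal = float(np.dot(U[:, 0], u) ** 2)   # principal left singular vector
inf_middle = float(np.dot(U[:, k], u) ** 2)      # (n/10 + 1)-th left singular vector

print(f"largest noise gap after singular value #{gap_index + 1}: {gaps[gap_index]:.3f}")
print(f"Inf_principal = {inf_principal:.3f}, Inf_middle = {inf_middle:.3f}")
```

Averaging `inf_principal` and `inf_middle` over many trials and sweeping `theta` would give curves of the kind described for the upper panel of Figure 9.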




Figure 9. The evolution of the informativeness of the principal and the 
middle component as a function of $\theta$ for the model in (1) with $X = G\Sigma^{1/2}$, 
where $G$ is an $n \times m$ matrix (here $n = m = 1000$) with i.i.d. mean zero, 
variance $1/m$ entries and $\Sigma = \mathrm{diag}(20 I_{n/10}, I_{n - n/10})$. The upper panel plots 
$\mathrm{Inf}_i = |\langle \tilde{u}_i, u \rangle|^2$ computed over 250 Monte-Carlo trials. The lower panel 
plots the $i$-th largest singular value of $\tilde{X}$. 

Thus the potential for informative middle components to emerge is the greatest in large, 
heterogeneous datasets where there might be significant temporal (or spatial) variation. 
These might be exploited for extracting additional processing gain beyond what principal 
component analysis might offer. 

Conversely, if the temporal covariance matrix $\Sigma$ represents a relatively homogeneous (in 
time) data set, then there will be no gap in the eigen-spectrum and the use of principal 
components is justifiably optimal. 

Expanding the range of noise models for which similar predictions can be made is 
a natural next step. It remains an open problem to fully characterize the vanishing 
informativeness of the components of the singular value decomposition associated with 
singular values that (asymptotically) exhibit an $o(1)$ eigen-gap. Additional hypotheses 
on the noise eigen-distribution will likely be required; establishing natural conditions 
for these remains an important line of inquiry. A result along these lines would firmly 
establish that the informative components associated with the singular values or eigenvalues that 
exhibit an eigen-gap are indeed the maximally informative components. 




References 

[1] Z.D. Bai and J.W. Silverstein. No eigenvalues outside the support of the limiting spectral distribution 
of large-dimensional sample covariance matrices. The Annals of Probability, 26(1):316-345, 1998. 

[2] J. Baik, G. Ben Arous, and S. Peche. Phase transition of the largest eigenvalue for nonnull complex 
sample covariance matrices. The Annals of Probability, 33(5):1643-1697, 2005. 

[3] J. Baik and J.W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population 
models. Journal of Multivariate Analysis, 97(6):1382-1408, 2006. 

[4] F. Benaych-Georges and R.R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank per- 
turbations of large random matrices. Advances in Mathematics, 227(1):494-521, 2011. 

[5] F. Benaych-Georges and R.R. Nadakuditi. The singular values and vectors of low rank perturbations 
of large rectangular random matrices. Journal of Multivariate Analysis, pages 120-135, 2012. 

[6] M. Brand. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and 
its Applications, 415(1):20-30, 2006. 

[7] S. Dasgupta. Learning mixtures of gaussians. In Foundations of Computer Science, 1999. 40th Annual 
Symposium on, pages 634-644. IEEE, 1999. 

[8] A. Deshpande and S. Vempala. Adaptive sampling and fast low-rank matrix approximation. Ap- 
proximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 
292-303, 2006. 

[9] P. Drineas and M.W. Mahoney. On the Nystrom method for approximating a Gram matrix for 
improved kernel-based learning. The Journal of Machine Learning Research, 6:2153-2175, 2005. 

[10] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 
1(3):211-218, 1936. 

[11] N. El Karoui. Tracy-Widom limit for the largest eigenvalue of a large class of complex sample 
covariance matrices. The Annals of Probability, 35(2):663-714, 2007. 

[12] J. Friedman, T. Hastie, and R. Tibshirani. The elements of statistical learning, volume 1. Springer 
Series in Statistics, 2001. 

[13] A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank 
approximations. Journal of the ACM (JACM), 51(6):1025-1041, 2004. 

[14] N. Halko, P.G. Martinsson, and J.A. Tropp. Finding structure with randomness: Probabilistic 
algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011. 

[15] Fumio Hiai and Denes Petz. The semicircle law, free random variables and entropy, volume 77 of 
Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2000. 

[16] D. Hsu and S.M. Kakade. Learning gaussian mixture models: Moment methods and spectral decom- 
positions. arXiv preprint arXiv:1206.5766, 2012. 

[17] I.M. Johnstone. On the distribution of the largest eigenvalue in principal components 
analysis. The Annals of Statistics, 29(2):295-327, 2001. 

[18] I.M. Johnstone. High dimensional statistical inference and random matrices. In Proceedings of the 
International Congress of Mathematicians: Madrid, August 22-30, 2006: invited lectures, pages 
307-333, 2006. 

[19] I. Jolliffe. Principal component analysis. Wiley Online Library, 2005. 

[20] R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. Learning 
Theory, pages 155-199, 2005. 

[21] S. Kritchman and B. Nadler. Determining the number of components in a factor model from limited 
noisy data. Chemometrics and Intelligent Laboratory Systems, 94(1):19-32, 2008. 

[22] S. Kritchman and B. Nadler. Non-parametric detection of the number of signals: hypothesis testing 
and random matrix theory. Signal Processing, IEEE Transactions on, 57(10):3930-3941, 2009. 

[23] L. Mirsky. Symmetric gauge functions and unitarily invariant norms. The Quarterly Journal of 
Mathematics, 11(1):50-59, 1960. 

[24] R.R. Nadakuditi and A. Edelman. Sample eigenvalue based detection of high-dimensional signals in 
white noise using relatively few samples. Signal Processing, IEEE Transactions on, 56(7):2625-2638, 
2008. 

[25] B. Nadler. Nonparametric detection of signals by information theoretic criteria: performance analysis 
and an improved estimator. Signal Processing, IEEE Transactions on, 58(5):2746-2756, 2010. 






[26] A. Onatski. Determining the number of factors from empirical distribution of eigenvalues. The Review 
of Economics and Statistics, 92(4):1004-1016, 2010. 

[27] D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. 
Statistica Sinica, 17(4):1617, 2007. 

[28] S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. In Proceedings of the 
thirty-third annual ACM symposium on Theory of computing, pages 247-257. ACM, 2001. 

[29] J.W. Silverstein and Z.D. Bai. On the empirical distribution of eigenvalues of a class of large 
dimensional random matrices. Journal of Multivariate Analysis, 54(2):175-192, 1995. 

[30] J.W. Silverstein and S.I. Choi. Analysis of the limiting spectral distribution of large dimensional 
random matrices. Journal of Multivariate Analysis, 54(2):295-309, 1995. 

[31] J.W. Silverstein and P.L. Combettes. Signal detection via spectral theory of large dimensional random 
matrices. Signal Processing, IEEE Transactions on, 40(8):2100-2105, 1992. 

[32] M.E. Tipping and C.M. Bishop. Mixtures of probabilistic principal component analyzers. Neural 
Computation, 11(2):443-482, 1999. 

[33] S. Vempala and G. Wang. A spectral algorithm for learning mixtures of distributions. In Foundations 
of Computer Science, 2002. Proceedings. The 43rd Annual IEEE Symposium on, pages 113-122. 
IEEE, 2002. 

Raj Rao Nadakuditi, Department of Electrical Engineering and Computer Science, 
University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109. USA. 

E-mail address: rajnrao@eecs.umich.edu 

URL: http://www.eecs.umich.edu/~rajnrao/ 



